Many Task Learning with Task Routing
Typical multi-task learning (MTL) methods rely on architectural adjustments
and a large trainable parameter set to jointly optimize over several tasks.
However, as the number of tasks increases, so do the complexity of these
architectural adjustments and the resource requirements. In this paper, we
introduce a method which applies a conditional feature-wise transformation over
the convolutional activations that enables a model to successfully perform a
large number of tasks. To distinguish from regular MTL, we introduce Many Task
Learning (MaTL) as a special case of MTL where more than 20 tasks are performed
by a single model. Our method, dubbed Task Routing (TR), is encapsulated in a
layer we call the Task Routing Layer (TRL), which, applied in an MaTL scenario,
successfully fits hundreds of classification tasks in one model. We evaluate
our method on 5 datasets against strong baselines and state-of-the-art
approaches.
Comment: 8 pages, 5 figures, 2 tables
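The core mechanism, a fixed task-specific binary mask applied element-wise to
convolutional activations, can be sketched in a few lines of plain Python. This
is an illustrative toy, not the paper's implementation: the function names, the
`keep_ratio` knob, and the 1-D stand-in for a feature map are all assumptions.

```python
import random

def make_task_masks(num_tasks, num_channels, keep_ratio=0.5, seed=0):
    """Sample one fixed binary channel mask per task.

    The masks are drawn once at initialisation and never trained, so each
    task is routed through its own sub-network of the shared model.
    `keep_ratio` (a hypothetical knob) is the fraction of channels kept.
    """
    rng = random.Random(seed)
    masks = []
    for _ in range(num_tasks):
        kept = set(rng.sample(range(num_channels), int(num_channels * keep_ratio)))
        masks.append([1.0 if c in kept else 0.0 for c in range(num_channels)])
    return masks

def task_routing(activations, masks, task_id):
    """Apply the active task's mask element-wise to the activations.

    `activations` is a flat list of per-channel values, a 1-D stand-in
    for a C x H x W convolutional feature map.
    """
    return [a * m for a, m in zip(activations, masks[task_id])]

masks = make_task_masks(num_tasks=3, num_channels=8)
routed = task_routing([1.0] * 8, masks, task_id=0)  # half the channels are zeroed
```

At training time one task is active per forward pass, so only that task's
unmasked channels contribute to its loss.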
HyperLearn: A Distributed Approach for Representation Learning in Datasets With Many Modalities
Multimodal datasets contain an enormous amount of relational information,
which grows exponentially with the introduction of new modalities. Learning
representations in such a scenario is inherently complex due to the presence of
multiple heterogeneous information channels. These channels can encode both (a)
inter-relations between the items of different modalities and (b)
intra-relations between the items of the same modality. Encoding multimedia
items into a continuous low-dimensional semantic space such that both types of
relations are captured and preserved is extremely challenging, especially if
the goal is a unified end-to-end learning framework. The two key challenges
that need to be addressed are: 1) the framework must be able to merge complex
intra- and inter-relations without losing any valuable information, and 2) the
learning model should be invariant to the addition of new and potentially very
different modalities. In this paper, we propose a flexible framework which can
scale to data streams from many modalities. To that end we introduce a
hypergraph-based model for data representation and deploy Graph Convolutional
Networks to fuse relational information within and across modalities. Our
approach provides an efficient solution for distributing otherwise extremely
computationally expensive or even infeasible training processes across
multiple GPUs without any sacrifice in accuracy. Moreover, adding new
modalities to our model requires only an additional GPU while keeping the
computational time unchanged, which brings representation learning to truly
multimodal datasets. We demonstrate the feasibility of our approach in
experiments on multimedia datasets featuring second-, third-, and fourth-order
relations.
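The fused message passing can be illustrated with a scalar toy version of
hypergraph convolution. This sketch assumes nothing from the paper's actual
code: node features are single numbers, and one averaging step stands in for
the matrix form used by real hypergraph GCNs.

```python
def hypergraph_conv(features, hyperedges):
    """One toy hypergraph message-passing step.

    `features` maps item -> scalar feature; `hyperedges` is a list of
    item sets, each tying together related items (possibly from different
    modalities). An item's new feature is the mean of the means of the
    hyperedges it belongs to; items outside every edge are left unchanged.
    """
    edge_means = [sum(features[n] for n in e) / len(e) for e in hyperedges]
    out = {}
    for node in features:
        incident = [m for e, m in zip(hyperedges, edge_means) if node in e]
        out[node] = sum(incident) / len(incident) if incident else features[node]
    return out

# Three items from different modalities, linked by two hyperedges.
feats = {"img": 1.0, "txt": 3.0, "aud": 5.0}
edges = [{"img", "txt"}, {"txt", "aud"}]
fused = hypergraph_conv(feats, edges)  # "txt" mixes information from both edges
```

Because each hyperedge is processed independently, edges (and hence whole
modalities) can be distributed across devices, which is the intuition behind
the one-GPU-per-modality scaling claim.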
BERT for Evidence Retrieval and Claim Verification
Motivated by the promising performance of pre-trained language models, we
investigate BERT in an evidence retrieval and claim verification pipeline for
the FEVER fact extraction and verification challenge. To this end, we propose
to use two BERT models, one for retrieving potential evidence sentences
supporting or rejecting claims, and another for verifying claims based on the
predicted evidence sets. To train the BERT retrieval system, we use pointwise
and pairwise loss functions, and examine the effect of hard negative mining. A
second BERT model is trained to classify each sample as supported, refuted, or
not enough information. Our system achieves a new state-of-the-art recall of
87.1 for retrieving the top five sentences from the FEVER document collection
of 50K Wikipedia pages, and ranks second on the official leaderboard with a
FEVER score of 69.7.
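The retrieval training signal described above (pairwise ranking plus hard
negative mining) can be sketched independently of BERT itself. The scalar
scores stand in for the model's sentence-level evidence scores; the margin
value and function names are illustrative assumptions, not the paper's exact
loss.

```python
def pairwise_hinge_loss(pos_score, neg_score, margin=1.0):
    """Pairwise ranking loss: a true evidence sentence should score at
    least `margin` higher than a non-evidence sentence."""
    return max(0.0, margin - (pos_score - neg_score))

def mine_hard_negatives(neg_scores, k=2):
    """Hard negative mining: keep the k highest-scoring negatives, i.e.
    the sentences the retriever currently confuses with evidence."""
    return sorted(neg_scores, reverse=True)[:k]

# A well-separated pair incurs no loss; a close pair does.
loss_ok = pairwise_hinge_loss(2.0, 0.5)   # positive leads by more than the margin
loss_bad = pairwise_hinge_loss(1.0, 0.5)  # positive leads by only 0.5
hard = mine_hard_negatives([0.1, 0.9, 0.4, 0.7])
```

Training on the mined negatives focuses gradient updates on exactly the
sentences the retriever gets wrong, which is why the technique matters at
FEVER's 50K-page scale.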
Detecting CNN-Generated Facial Images in Real-World Scenarios
Artificial, CNN-generated images are now of such high quality that humans
have trouble distinguishing them from real images. Several algorithmic
detection methods have been proposed, but these appear to generalize poorly to
data from unknown sources, making them infeasible for real-world scenarios. In
this work, we present a framework for evaluating detection methods under
real-world conditions, consisting of cross-model, cross-data, and
post-processing evaluation, and we evaluate state-of-the-art detection methods
using the proposed framework. Furthermore, we examine the usefulness of
commonly used image pre-processing methods. Lastly, we evaluate human
performance on detecting CNN-generated images, along with factors that
influence this performance, by conducting an online survey. Our results suggest
that CNN-based detection methods are not yet robust enough to be used in
real-world scenarios.
Comment: Accepted to the workshop on Media Forensics at CVPR 202
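The proposed protocol is essentially a three-way evaluation grid: every
detector is scored on every data source under every post-processing operation.
A minimal sketch, in which the detector, dataset, and post-op callables are
hypothetical stand-ins and scalars stand in for images:

```python
def evaluation_grid(detectors, datasets, post_ops):
    """Accuracy of every detector on every dataset under every post-op.

    `detectors` maps name -> predict fn (image -> bool, True meaning
    "CNN-generated"); `datasets` maps name -> list of (image, label)
    pairs; `post_ops` maps name -> image transform. Returns accuracy
    keyed by (detector, dataset, post_op).
    """
    results = {}
    for d_name, detect in detectors.items():
        for ds_name, samples in datasets.items():
            for p_name, post in post_ops.items():
                correct = sum(detect(post(img)) == label for img, label in samples)
                results[(d_name, ds_name, p_name)] = correct / len(samples)
    return results

# Toy run: a thresholding "detector" on scalar stand-ins for images.
detectors = {"thresh": lambda s: s > 0.5}
datasets = {"gan_faces": [(0.9, True), (0.2, False), (0.6, False)]}
post_ops = {"none": lambda s: s, "blur": lambda s: s * 0.5}
grid = evaluation_grid(detectors, datasets, post_ops)
```

Cross-model and cross-data generalization failures show up as cells where a
detector's accuracy collapses on sources it was not trained on.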
4-Connected Shift Residual Networks
The shift operation was recently introduced as an alternative to spatial
convolutions. The operation moves subsets of activations horizontally and/or
vertically. Spatial convolutions are then replaced with shift operations
followed by point-wise convolutions, significantly reducing computational
costs. In this work, we investigate how shifts are best applied to
high-accuracy CNNs. We apply shifts of two different neighbourhood groups to ResNet
on ImageNet: the originally introduced 8-connected (8C) neighbourhood shift and
the less well studied 4-connected (4C) neighbourhood shift. We find that when
replacing ResNet's spatial convolutions with shifts, both shift neighbourhoods
give equal ImageNet accuracy, showing the sufficiency of small neighbourhoods
for large images. Interestingly, when incorporating shifts to all point-wise
convolutions in residual networks, 4-connected shifts outperform 8-connected
shifts. Such a 4-connected shift setup gives the same accuracy as full residual
networks while reducing the number of parameters and FLOPs by over 40%. We then
highlight that without spatial convolutions, ResNet's downsampling/upsampling
bottleneck channel structure is no longer needed. We present a new 4C
shift-based residual network that is much shallower than the original ResNet
yet achieves higher accuracy at the same computational cost. This network is
the most accurate shift-based network shown to date, demonstrating the
potential of shifting in deep neural networks.
Comment: ICCV Neural Architects Workshop 201
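The shift operation itself is simple enough to sketch directly. Below is a toy
pure-Python version of a 4-connected shift, under the assumption that channel
c is assigned direction c mod 4; the real operation runs on tensors and is
followed by a point-wise (1x1) convolution that mixes the shifted channels.

```python
def shift_4c(feature_maps):
    """4-connected shift: each channel moves one pixel up, down, left,
    or right (direction c % 4), with zero padding at the border.

    `feature_maps` is a list of channels, each an H x W list of lists.
    """
    H, W = len(feature_maps[0]), len(feature_maps[0][0])
    dirs = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    out = []
    for c, fmap in enumerate(feature_maps):
        dy, dx = dirs[c % 4]
        shifted = [[0.0] * W for _ in range(H)]
        for y in range(H):
            for x in range(W):
                sy, sx = y - dy, x - dx  # source pixel for this output
                if 0 <= sy < H and 0 <= sx < W:
                    shifted[y][x] = fmap[sy][sx]
        out.append(shifted)
    return out

fm = [[1.0, 2.0], [3.0, 4.0]]
res = shift_4c([fm, fm])  # channel 0 shifts up, channel 1 shifts down
```

Note that the shift has no parameters and costs no multiplications; all
learning happens in the surrounding point-wise convolutions, which is where the
parameter and FLOP savings come from.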
Episodic Multi-Task Learning with Heterogeneous Neural Processes
This paper focuses on the data-insufficiency problem in multi-task learning
within an episodic training setup. Specifically, we explore the potential of
heterogeneous information across tasks and meta-knowledge among episodes to
effectively tackle each task with limited data. Existing meta-learning methods
often fail to take advantage of crucial heterogeneous information in a single
episode, while multi-task learning models neglect reusing experience from
earlier episodes. To address the problem of insufficient data, we develop
Heterogeneous Neural Processes (HNPs) for the episodic multi-task setup. Within
the framework of hierarchical Bayes, HNPs effectively capitalize on prior
experiences as meta-knowledge and capture task-relatedness among heterogeneous
tasks, mitigating data insufficiency. Meanwhile, transformer-structured
inference modules are designed to enable efficient inference of meta-knowledge
and task-relatedness. In this way, HNPs can learn more powerful
functional priors for adapting to novel heterogeneous tasks in each meta-test
episode. Experimental results show the superior performance of the proposed
HNPs over typical baselines, and ablation studies verify the effectiveness of
the designed inference modules.
Comment: 28 pages, spotlight at NeurIPS 202
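The neural-process backbone that HNPs build on rests on one structural idea: a
permutation-invariant aggregation of a few labelled context points into a task
representation, which then conditions predictions for that task. A toy sketch,
where the fixed pair encoder is a hypothetical stand-in for a learned network:

```python
def encode_pair(x, y):
    """Hypothetical fixed encoder for an (x, y) context pair; a real
    neural process would use a learned network here."""
    return (x + y, x * y)

def aggregate(context):
    """Mean-pool encoded context pairs into an order-invariant task
    representation, the step that lets a handful of labelled examples
    condition predictions for the whole task."""
    feats = [encode_pair(x, y) for x, y in context]
    n = len(feats)
    return tuple(sum(f[i] for f in feats) / n for i in range(2))

r1 = aggregate([(1.0, 2.0), (3.0, 4.0)])
r2 = aggregate([(3.0, 4.0), (1.0, 2.0)])  # same set, different order
```

Mean pooling makes the representation independent of context order and size,
which is what lets an episodic model absorb however little data each task
provides.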